Predicting conserved water-mediated and polar ligand interactions in proteins using a K-nearest-neighbors genetic algorithm.
Authors
Abstract
Water-mediated ligand interactions are essential to biological processes, from product displacement in thymidylate synthase to DNA recognition by Trp repressor, yet the structural chemistry influencing whether bound water is displaced or participates in ligand binding is not well characterized. Consolv, employing a hybrid k-nearest-neighbors classifier/genetic algorithm, predicts bound water molecules conserved between free and ligand-bound protein structures by examining the environment of each water molecule in the free structure. Four environmental features are used: the water molecule's crystallographic temperature factor, the number of hydrogen bonds between the water molecule and protein, and the density and hydrophilicity of neighboring protein atoms. After training on 13 non-homologous proteins, Consolv predicted the conservation of active-site water molecules upon ligand binding with 75% accuracy (Matthews coefficient Cm = 0.41) for seven new proteins. Mispredictions typically involved water molecules predicted to be conserved that were displaced by a polar ligand atom, indicating that Consolv correctly assesses polar binding sites; 90% accuracy (Cm = 0.78) was achieved for predicting conserved active-site water or polar ligand atom binding. Consolv thus provides an accurate means for optimizing ligand design by identifying sites favored to be occupied by either a mediating water molecule or a polar ligand atom, as well as water molecules likely to be displaced by the ligand. Accuracy for predicting first-shell water conservation between independently determined structures was 61% (Cm = 0.23). The ability to predict water-mediated and polar interactions from the free protein structure indicates the surprising extent to which the conservation or displacement of active-site bound water is independent of the ligand, and shows that the protein micro-environment of each water molecule is the dominant influence.
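As a rough illustration of the two ingredients named in the abstract, the sketch below pairs a k-nearest-neighbors vote over four per-water features with the Matthews coefficient used to report accuracy. It is not Consolv's implementation: the feature values, the per-feature weights (assumed here to be the kind of quantity a genetic algorithm might tune), the choice of k, and the function names are all hypothetical, and the features are assumed to be pre-scaled to the 0-1 range.

```python
import math
from collections import Counter

def knn_predict(train, query, weights, k=3):
    """Majority vote among the k training waters closest to `query`.

    train   : list of (features, label) pairs
    weights : one non-negative weight per feature (hypothetical; assumed
              to be what an optimizer such as a genetic algorithm tunes)
    """
    def dist(a, b):
        # Weighted Euclidean distance between two feature vectors.
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def matthews_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column is empty
    return (tp * tn - fp * fn) / denom

# Made-up, pre-scaled features in the order listed in the abstract:
# (temperature factor, hydrogen bonds to protein, atom density, hydrophilicity)
train = [
    ((0.2, 0.9, 0.8, 0.7), "conserved"),
    ((0.3, 0.8, 0.7, 0.8), "conserved"),
    ((0.1, 1.0, 0.9, 0.6), "conserved"),
    ((0.9, 0.1, 0.2, 0.3), "displaced"),
    ((0.8, 0.2, 0.3, 0.2), "displaced"),
    ((0.7, 0.0, 0.1, 0.4), "displaced"),
]
weights = (1.0, 1.0, 1.0, 1.0)  # equal weights, purely for illustration

prediction = knn_predict(train, (0.25, 0.85, 0.75, 0.7), weights, k=3)
cm = matthews_coefficient(tp=3, tn=3, fp=1, fn=1)  # 0.5
```

Unlike raw accuracy, the Matthews coefficient stays near zero for a classifier that simply predicts the majority class, which is presumably why the abstract reports it next to each percentage.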
Similar resources
A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...
A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection
Email is one of the fastest and most economical forms of communication. The growing number of email users has increased the volume of spam in recent years. Spam not only costs users money, time, and bandwidth, but has also become a risk to the efficiency, reliability, and security of networks. Spam developers are always trying to find ways to escape the existing filters; therefore new filters to de...
Diagnosis of Heart Disease Using Binary Grasshopper Optimization Algorithm and K-Nearest Neighbors
Introduction: The heart is one of the main organs of the human body, and its unhealthiness is an important factor in human mortality. Heart disease may be asymptomatic, but medical tests can predict and diagnose it. Diagnosis of heart disease requires extensive experience of specialist physicians. The aim of this study is to help physicians diagnose heart disease based on hybrid Binary Grasshop...
An Algorithm for Predicting Recurrence of Breast Cancer Using Genetic Algorithm and Nearest Neighbor Algorithm
Introduction: Breast cancer is one of the most common types of cancer and the most common type of malignancy in women, which has been growing in recent years. Patients with this disease have a chance of recurrence. Many factors reduce or increase this probability. Data mining is one of the methods used to detect or anticipate cancers, and one of its most common uses is to predict the recurrence...
Journal title:
- Journal of molecular biology
Volume 265, Issue 4
Pages: -
Publication date: 1997